PAC Associative Reinforcement Learning

Author

  • Claude-Nicolas Fiechter
Abstract

General algorithms for the reinforcement learning problem typically learn policies in the form of a table that directly maps the states of the environment to actions. When the state space is large these methods become impractical. One approach to increasing efficiency is to restrict the class of policies by considering only policies that can be described using some fixed representation. This paper pursues this approach and analyzes the associative reinforcement learning problem in the PAC learning framework. As a representation, we use a general form of decision lists that can describe a wide variety of restricted classes of policies. We then describe an algorithm that provably learns, with high probability, a good approximation of the optimal policy for any environment that satisfies a particular adequacy condition stipulated for the representation used. The running time of the algorithm is polynomial in the size of the representation, in addition to the usual parameters for a PAC algorithm. We give some experimental results showing that the algorithm performs well in practice.

¹ Supported in part by NSF grant CCR-9202158 and by an Andrew Mellon predoctoral fellowship at the University of Pittsburgh.
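The abstract does not include code, but the policy representation it describes, an ordered list of condition–action rules checked in sequence, can be illustrated with a short sketch. The sketch below is an illustrative assumption only: it encodes states as dictionaries of boolean features, and the names (DecisionListPolicy, obstacle_ahead, and so on) are hypothetical, not the paper's notation.

```python
# Minimal sketch of a decision-list policy: an ordered list of
# (condition, action) rules; the first rule whose condition matches the
# current state determines the action. Rule format and all names below
# are illustrative assumptions, not the paper's notation.

from typing import Callable, List, Tuple


class DecisionListPolicy:
    def __init__(self, rules: List[Tuple[Callable[[dict], bool], str]],
                 default_action: str):
        self.rules = rules            # ordered (condition, action) pairs
        self.default_action = default_action

    def act(self, state: dict) -> str:
        # Scan rules in order; fire the first condition that matches.
        for condition, action in self.rules:
            if condition(state):
                return action
        return self.default_action    # fall through to the default rule


# Usage: a toy policy over boolean state features.
policy = DecisionListPolicy(
    rules=[
        (lambda s: s["obstacle_ahead"] and s["clear_left"], "turn_left"),
        (lambda s: s["obstacle_ahead"], "turn_right"),
    ],
    default_action="move_forward",
)

print(policy.act({"obstacle_ahead": True, "clear_left": False}))  # turn_right
```

Restricting the policy class to such a fixed representation is what allows the learning algorithm's running time to scale with the size of the representation rather than with the size of the state space.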


Related articles

On Polynomial Time PAC Reinforcement Learning with Rich Observations

We study the computational tractability of provably sample-efficient (PAC) reinforcement learning in episodic environments with high-dimensional observations. We present new sample efficient algorithms for environments with deterministic hidden state dynamics but stochastic rich observations. These methods represent computationally efficient alternatives to prior algorithms that rely on enumera...


PAC-Bayesian Policy Evaluation for Reinforcement Learning

Bayesian priors offer a compact yet general means of incorporating domain knowledge into many learning tasks. The correctness of the Bayesian analysis and inference, however, largely depends on the accuracy and correctness of these priors. PAC-Bayesian methods overcome this problem by providing bounds that hold regardless of the correctness of the prior distribution. This paper introduces the first...


Lower PAC bound on Upper Confidence Bound-based Q-learning with examples

Recently, there has been significant progress in understanding reinforcement learning in Markov decision processes (MDPs). We focus on improving Q-learning and analyze its sample complexity. We investigate the performance of tabular Q-learning, Approximate Q-learning and UCB-based Q-learning. We also derive a lower PAC bound Ω((|S||A|/ε²) ln(|A|/δ)) for UCB-based Q-learning. Two tasks, Ca...
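As a rough illustration of what "UCB-based Q-learning" refers to, the sketch below shows standard Q-learning with an upper-confidence exploration bonus added to action selection. This is a generic sketch, not the algorithm analyzed in the paper above; the bonus form c·sqrt(ln t / n(s, a)), the hyperparameters, and the environment interface (reset, step, actions) are all assumptions made for the example.

```python
# Generic sketch of Q-learning with an upper-confidence exploration bonus
# ("UCB-based Q-learning"). The bonus form and all hyperparameters are
# illustrative assumptions, as is the env interface (reset, step, actions).

import math
from collections import defaultdict


def ucb_q_learning(env, episodes=500, alpha=0.1, gamma=0.99, c=1.0):
    Q = defaultdict(float)       # Q[(state, action)] value estimates
    N = defaultdict(int)         # visit counts for the exploration bonus
    t = 1
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Pick the action maximizing value plus an optimism bonus that
            # shrinks as the (state, action) pair is visited more often.
            action = max(
                env.actions,
                key=lambda a: Q[(state, a)]
                + c * math.sqrt(math.log(t) / (N[(state, a)] + 1)),
            )
            next_state, reward, done = env.step(action)
            best_next = max(Q[(next_state, a)] for a in env.actions)
            target = reward + (0 if done else gamma * best_next)
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            N[(state, action)] += 1
            t += 1
            state = next_state
    return Q
```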


Reinforcement Learning in Finite MDPs: PAC Analysis

We study the problem of learning near-optimal behavior in finite Markov Decision Processes (MDPs) with a polynomial number of samples. These "PAC-MDP" algorithms include the well-known E³ and R-MAX algorithms as well as the more recent Delayed Q-learning algorithm. We summarize the current state-of-the-art by presenting bounds for the problem in a unified theoretical framework. We also present a...


UBEV - A More Practical Algorithm for Episodic RL with Near-Optimal PAC and Regret Guarantees

Statistical performance bounds for reinforcement learning (RL) algorithms can be critical for high-stakes applications like healthcare. This paper introduces a new framework for theoretically measuring the performance of such algorithms called Uniform-PAC, which is a strengthening of the classical Probably Approximately Correct (PAC) framework. In contrast to the PAC framework, the uniform vers...



Journal title:

Volume   Issue

Pages  -

Publication year: 1995